A new method for extracting partial representations of a program is described.
Given two sets of variable instances, source and sink, a graph is constructed showing
the  statements  that  cause  definitions  of  source  to  affect  uses  of  sink.  This  criterion 
can  express  a  wider  range  of  queries  than  the  various  forms  of  slice  criteria,  which  it 
subsumes  as  special  cases.  On  the  standard  slice  criterion  (backward  slicing  from  a 
use  or  definition)  it  produces  better  results  than  existing  algorithms. 

The  method  is  modular.  By  treating  all  statements  abstractly  as  def-use  relations,  it 
can  present  a  procedure  call  as  a  simple  statement,  so  that  it  appears  in  the  graph  as 
a  single  node  whose  role  may  be  understood  without  looking  beyond  the  context  of 
the  call. 
1 Introduction


No  task  that  involves  existing  code  -  whether  debugging,  adaptation  or  restructuring  -  can 
begin  until  the  developer  understands  it.  Knowing  all  the  details  of  how  and  why  the  code 
works  is  almost  never  necessary;  an  experienced  developer  will  try  to  extract  only  just  enough 
information  to  perform  the  task  at  hand. 

Many  of  the  questions  that  arise  when  a  developer  faces  unfamiliar  code  are  partial  in  two 
respects.  First,  they  concern  the  relationships  between  program  components  rather  than  their 
values.  That  a  procedure  modifies  a  global  counter,  for  instance,  is  likely  to  matter  more  than 
whether  it  increments  or  decrements  it;  and,  if  values  do  matter,  they  are  usually  determined 
easily  once  the  relationships  are  clear.  Second,  questions  tend  to  focus  on  some  components 
and ignore others. A developer might want to see, for instance, only the statements that read
or  write  some  variable,  or  those  that  cause  one  variable  to  acquire  its  value. 

Program  slicing  is  a  focusing  technique  based  on  dependence  relationships  that  seems  well 
suited to reverse engineering. Very roughly, a slice of a program is a skeleton obtained by
deleting all statements that do not affect the value of a given variable at a given line [Wei84].
Slicing  was  originally  devised  for  debugging,  where  its  utility  is  easily  seen:  if  the  value  of  that 
variable  at  that  line  is  wrong,  the  code  that  corrupted  it  (but  not  necessarily  the  bug  [WL86]) 
must  be  within  the  slice,  which  may  be  much  smaller  than  the  original  program. 

The  utility  of  slicing  for  reverse  engineering,  however,  is  less  obvious.  Not  all  questions  - 
even  when  restricted  to  the  vocabulary  of  program  dependences  -  can  be  cast  as  slice  criteria. 
How a variable affects other variables, for example, is a different question from how it is
affected by others, and calls for a different analysis (sometimes referred to as “forward slicing”
in  contrast  to  standard  “backward”  slicing). 

Furthermore,  when  applied  to  reverse  engineering,  slicing  can  give  disappointing  results. 
First,  slices  often  turn  out  to  be  too  large  to  be  useful.  Sometimes  this  is  due  to  the  limitations 
of static analysis and a smaller slice exists even though it cannot be found. But more
commonly, the slice’s focus is too broad. The user may want to understand only how a
procedure’s result is obtained from its arguments and not from globals, say, or vice versa.
Slicing cannot discriminate origins, so every statement that affects the given variable will be
drawn in, whatever the source of its dependence.

Second, procedure call is troublesome. Programs are easier to understand when procedures
are examined one at a time. Interprocedural slicing [HDC88, HRB90, Bin93] treats
procedure call as a linkage mechanism rather than an abstraction barrier, and includes
statements in the slice from within the called procedure. As a result, to understand why a
procedure call appears in a slice, the user must look inside to see which of its statements were
responsible. Worse, the natural slice criteria associated with a procedure call are not
expressible. One cannot, for example, easily specify a slice on the use of a global x by the call, since
the use of x will occur at a node inside the procedure (which the user will have to find) and
not at the calling node.

We have developed an analysis similar to slicing that aims to overcome these problems. In
place of the slice criterion, the user specifies two sets of variable instances, source and sink;
the analysis then identifies the statements that cause source to affect sink. This allows a wider
variety of questions to be formulated, of which the various kinds of slicing emerge as special
cases.

Our analysis is based on a variant of the program dependence graph that treats nodes in a
more abstract and uniform fashion. Instead of having a node for every simple assignment, and
special nodes for procedure call and return, etc., we model each node as a def-use relation.
This allows a modular analysis in which a procedure call appears as a statement like any other,
but with a role determined by the dependences of its body.

program Sum
    s = 0
    x = 1
    while x < 11 do
        s = s + x
        x = x + 2
    end
end

[PDG graphic not reproduced]

Figure 1: A program and its PDG

Following the deconstructionist tendency of this line of research, we call our analysis chopping
and have implemented it in a tool called Chopshop. The paper starts by explaining the
foundations  of  the  tool  in  the  abstract  program  dependence  graph  and  how  it  differs  from  the 
conventional  dependence  graph.  It  then  shows  how  the  relations  of  this  abstract  graph  are 
used  to  chop  a  procedure  and  to  compute  abstractions  of  procedure  calls. 


2  The  Program  Dependence  Graph:  Relating  Statements 

The  program  dependence  graph  (PDG)  is  a  popular  representation  of  code  that  is  well  suited 
to slicing and a variety of other program manipulations. Its nodes are like those of a flowchart:
one for each primitive statement (such as an assignment), one for each predicate (at the
head of a loop or if-statement), and special entry and exit nodes for the program as a whole.
Its edges, however, do not express control flow. Instead, an edge connects one node to another
when the execution of the second is affected directly by the execution of the first.

A simple program (taken from [RY89]) with its PDG is shown in Figure 1. A flowchart for
this program would show an edge from s = 0 to x = 1 because of their syntactic order, but,
since their executions are independent, there is no edge connecting them in the PDG.

There are two kinds of edge in the PDG. The solid edges are dataflow dependences; an
edge from i to j indicates that some variable is used at j that, on some path through the
control-flow graph, was last defined at i. The edges from x = 1 to x = x + 2 and from x = x + 2 to
s = s + x, for example, are both due to the variable x; the latter is said to be “loop carried”,
since the connecting path goes round the loop.

The dotted edges are control dependences; an edge links a predicate node i to a node j
when the evaluation of the predicate at i immediately controls execution of the node j. Note
that both statements of the loop’s body are control dependent on its predicate.

Entry  and  exit  nodes  may  be  treated  in  a  number  of  ways.  We  assume  that  all  variables  are 
defined  on  entry  and  used  on  exit.  This  extends  nicely  to  the  PDG  of  a  called  procedure,  in 
which  the  entry  node  stands  for  the  prior  definitions  of  variables,  and  the  exit  node  for  their 
subsequent  uses.  It  also  seems  more  natural  than  the  conventional  treatment  in  which  the 
entry  node  is  viewed  as  a  predicate  on  which  other  nodes  have  control  dependences 
[FOW87,  RY89]. 

Slices are easily (but, as we shall see, not accurately) calculated from the PDG. To slice the
program on some variable defined or used at a node, one simply walks back over the PDG,
marking all the nodes on the way, and then deletes from the program text the statements
corresponding to unmarked nodes [OO84]. Weiser’s original algorithm [Wei84, LR87] is more
general, since it allows slicing on variables not used or defined at the given line, but also less
efficient, since it recomputes the dependences for each new slice.

With suitable elaborations (such as def-order edges and true/false labelling of
control-dependence edges), the PDG may be regarded as a complete representation of the program, so
that two programs with isomorphic PDGs must behave equivalently [HPR88, CF89]. For
our  purposes,  these  elaborations  are  not  relevant. 

Finally, a note on how the PDG is constructed. The first step is to obtain, for each node of
the control-flow graph, a set of reaching definitions [ASU88]: a definition of variable x at
node i reaches a node j if there is a path from i to j with no intervening definition of x. For
each reaching definition at j of a variable that is actually used at j, a dataflow dependence edge
from i to j is inserted. The control dependence edges are a little trickier, requiring the
calculation of a post-dominator tree [FOW87].
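
As a rough illustration of the first step only, the following Python sketch computes reaching definitions for the program Sum of Figure 1 by iterating to a fixed point, and then derives the dataflow dependence edges. The node numbering, the encoding of the control-flow graph and the function names are assumptions made for this example; control dependences and the entry and exit nodes are left out.

```python
# Nodes of Sum (numbering is an assumption of this sketch):
# 1: s = 0   2: x = 1   3: while x < 11   4: s = s + x   5: x = x + 2
DEFS = {1: {"s"}, 2: {"x"}, 3: set(), 4: {"s"}, 5: {"x"}}
USES = {1: set(), 2: set(), 3: {"x"}, 4: {"s", "x"}, 5: {"x"}}
SUCC = {1: [2], 2: [3], 3: [4], 4: [5], 5: [3]}  # loop exit omitted


def reaching_definitions(defs, succ):
    """For each node, the (variable, site) definitions that reach its entry."""
    preds = {n: [m for m in succ if n in succ[m]] for n in succ}
    reach_in = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            new_in = set()
            for p in preds[n]:
                # Definitions leaving p: those p makes, plus those it does not kill.
                survive = {(v, d) for (v, d) in reach_in[p] if v not in defs[p]}
                new_in |= survive | {(v, p) for v in defs[p]}
            if new_in != reach_in[n]:
                reach_in[n] = new_in
                changed = True
    return reach_in


def dataflow_edges(defs, uses, succ):
    """Edge (i, j, v): v is used at j and a definition of v at i reaches j."""
    reach_in = reaching_definitions(defs, succ)
    return {(i, j, v) for j in succ for (v, i) in reach_in[j] if v in uses[j]}


edges = dataflow_edges(DEFS, USES, SUCC)
for edge in sorted(edges):
    print(edge)
```

The edge `(5, 4, "x")` in the output is the loop-carried dependence mentioned above, and no edge connects nodes 1 and 2.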

3  The  Abstract  PDG:  Relating  Variable  Instances 

Suppose one of the statements in our program is a call to a procedure whose internal
structure is of no interest. We would still like to construct a PDG for the program - in general,
itself a procedure - that takes account of the behaviour of the called procedure without
explicitly including its statements as a subgraph.

In  the  construction  of  the  standard  PDG,  the  only  details  of  a  statement  that  matter  are  the 
sets  of  variables  that  it  uses  and  defines.  Usually,  only  one  variable  is  defined  in  a  primitive 
statement,  and  a  dataflow  edge  leaving  the  corresponding  node  is  clearly  due  to  that  variable. 
For  example,  the  statement 

x  =  y  +  z 

has a definition set of {x} and a use set of {y, z}; if there is a dataflow edge from this node to
another node, it must be because that node uses x.

A  naive  extension  to  handle  procedure  call  would  simply  calculate  definition  and  use  sets 
for  the  body  of  the  procedure;  these  would  then  (with  appropriate  renaming  of  formals  to 
actuals)  play  the  role  of  definition  and  use  sets  for  the  call,  treated  as  a  primitive  statement. 
The add procedure in Figure 2, for instance, would be given a definition set of {s} and a use
set of {s, i}; after renaming i to x, the call add(x) would have definitions {s} and uses {s, x},
just like the assignment s = s + x.

The resulting PDG (to the right in Figure 2) is as before, with add(x) replacing s = s + x.
Moreover, any slices of the main program calculated by the standard algorithm will be exactly
as if we first inlined the procedure, calculated the slice, and then collapsed the body back to a
call.

[graphic not reproduced]

Figure 2: Case in which naive PDG is accurate

program Sum2
    procedure add
        s = s + x
        x = x + 1
    end
    s = 0
    x = 1
    while x < 11 do
        add()
    end
end

[Figure 3: the program Sum2 above with its naive PDG; graphic and caption not recovered]

program Sum2
    procedure add
41      s = s + x
42      x = x + 1
    end
1   s = 0
2   x = 1
3   while x < 11 do
4       add()
    end
end

Figure 4: Abstract PDG of body of Sum2

[graphic not reproduced; its legend distinguishes the ud relation from the du relation]

Figure 5: Tracing back a use-def edge through add's def-uses:
use of x_3 comes from def of x_2 but not def of s_1

In general, though, this approach will produce poor results, giving slices that are far larger
than necessary. To see why, consider another variant of Sum in which both statements of the
loop body are moved to a procedure (Figure 3). The resulting PDG correctly relates the
statements, there being an edge to or from the procedure call exactly when there was an edge in
the original PDG to or from one of its statements. But this time the loss of structure in the
procedure call affects the calculation of slices. If we slice on the use of x in the predicate
x < 11, we will obtain all the statements of the original program, including the irrelevant s = 0.
Treating the call to add as a single node spuriously associates the definition of x with the use
of s, effectively merging paths through the body of the procedure that should be disjoint.

To avoid this, we must maintain the association between definitions and uses. Every node
in the PDG is given, in place of a definition set and a use set, a binary relation on variables that
contains the pair (u, v) when u is defined using v. Furthermore, each dataflow dependence
edge is labelled with the variable defined at its source.

The add procedure of Sum2 would be given the def-use relation

{(s, x), (s, s), (x, x)}

from which it is clear that a definition of x cannot come from a use of s. The edge from s = 0
to add() is labelled s and the edge from x = 1 is labelled x (Figure 4).

A new slicing algorithm will now succeed in excluding s = 0 from the slice of x at x < 11.
Instead of simply tracing backwards along edges between statement nodes, it follows the uses
and definitions of variables. It will determine from the def-use relation of add() that the
definition of x is due to a use of x alone, and will find the node’s predecessors by following only
edges marked x. Figure 5 shows the relevant def-use associations of add() and how they are
connected to the use-def edges; in particular, the outgoing x edges are not connected to the
incoming s edges.
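
The difference between the two traversals can be sketched in a few lines of Python. This is only an illustration under assumed encodings (the edge sets, the `DEF_USE` table and both function names are invented here, and the entry and exit nodes are omitted); it is not the algorithm as implemented in Chopshop.

```python
# Sites of Sum2 (Figure 4): 1: s = 0, 2: x = 1, 3: while x < 11, 4: add()
# Dataflow edges labelled with their variable: (def_site, use_site, variable).
UD_EDGES = {(1, 4, "s"), (4, 4, "s"), (2, 4, "x"), (4, 4, "x"),
            (2, 3, "x"), (4, 3, "x")}
# Control dependence: site -> (controlling site, variable tested there).
CONTROL = {4: (3, "x")}
# Def-use relation per site: defined variable -> variables it is defined from.
DEF_USE = {1: {"s": set()}, 2: {"x": set()}, 3: {},
           4: {"s": {"s", "x"}, "x": {"x"}}}


def naive_slice(site):
    """Walk back over statement nodes, ignoring which variable an edge carries."""
    seen, work = set(), [site]
    while work:
        n = work.pop()
        if n in seen:
            continue
        seen.add(n)
        work += [i for (i, j, _) in UD_EDGES if j == n]
        if n in CONTROL:
            work.append(CONTROL[n][0])
    return seen


def labelled_slice(var, site):
    """Walk back over uses of variables, following only edges with matching labels."""
    seen_uses, sites, work = set(), {site}, [(var, site)]
    while work:
        v, j = work.pop()
        if (v, j) in seen_uses:
            continue
        seen_uses.add((v, j))
        for (i, jj, label) in UD_EDGES:
            if jj == j and label == v:
                sites.add(i)
                # The definition of v at i depends only on the uses paired with it.
                work += [(u, i) for u in DEF_USE[i][v]]
                if i in CONTROL:
                    c_site, c_var = CONTROL[i]
                    sites.add(c_site)
                    work.append((c_var, c_site))
    return sites


print(sorted(naive_slice(3)))          # [1, 2, 3, 4]
print(sorted(labelled_slice("x", 3)))  # [2, 3, 4]
```

The naive walk reaches site 1 through the s edge into add(), whereas the labelled walk never follows that edge because the definition of x at site 4 is paired only with a use of x.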
4 Formal Definition of the Abstract PDG

The  conventional  PDG  relates  statements  with  edges  marked  only  to  distinguish  control  and 
dataflow dependences, so it can be modelled as two binary relations on statements. The
argument of the last section suggests that, at least for dataflow dependences, we shall need to label
each  edge  with  the  variable  responsible  for  the  dependence. 

For  uniformity,  it  is  convenient  to  model  all  the  forms  of  dependence  as  relations  over  the 
same  set.  A  variable  instance  is  a  pair  consisting  of  a  variable  and  a  site: 

Instance = Var × Site

A site is just a node in the graph, but it also corresponds to a point in the program text. An
instance is not the same as a syntactic occurrence, since each primitive statement or procedure
call occupies one site, and can have only one instance per variable, however many times
the variable appears. We shall write x_i for the instance of variable x at site i. The statement

41  s = s + x

for example, would have a definition of s_41 and uses of s_41 and x_41. The purpose of this scheme
is simply to label variables with their sites; it involves no precomputation of dependences (in
contrast to single static assignment form [C+91], in which variables are labelled so that uses
match their corresponding definitions).

The  abstract  PDG  is  modelled  as  three  relations  on  instances: 

du, ud, cd: Instance ↔ Instance

The du relation holds the def-use associations of the individual statements. It contains the
pair (x_i, y_i) when x is a variable defined at site i by a use of the variable y. If a variable is
defined by the use of no variable, it cannot be omitted from the du relation, for otherwise it will
appear not to be defined at all. So we introduce a dummy variable ⊥, and a statement such as

2  x = 1

will contribute (x_2, ⊥_2) to the du relation. Whenever a site defines a variable, we add a
dependence on another special variable γ; the pair (x_i, γ_i) indicates that the variable x is defined at
i (for reasons that will soon be clear). Statement 41 above would thus contribute the pairs

(s_41, s_41), (s_41, x_41) and (s_41, γ_41).

⊥ and γ are the only special variables, so

Var = ProgramVariables ∪ {⊥, γ}

The ud relation models the dataflow dependences between statements; it corresponds to
the dataflow dependence relation of the standard PDG. Its pairs are of the form (x_i, x_j), where
x is a variable used at site i and defined at site j.

Lastly, the cd relation models control dependences. It contains the pair (γ_i, x_j) when site i
has a control dependence on site j, and site j determines the flow of control by testing variable
x. In contrast to the standard PDG, we shall not distinguish predicate nodes and statement
nodes. A node may both have side effects and influence the flow of control. This uniformity
simplifies the association between the graph and the program text for languages like C in
which side-effecting conditionals are frequent. The fragment

1  if (x++)

2 

for example, would not require two separate nodes for the conditional expression. The value
of the expression tested is assigned to a temporary variable, e say, so that statement 1
contributes to the du relation

(e_1, x_1), (x_1, x_1), (e_1, γ_1) and (x_1, γ_1)

and statement 2 contributes

(y_2, z_2) and (y_2, γ_2).

The control dependence of statement 2 is then expressed by the pair

(γ_2, e_1)

so that in the composite if-statement there will be a transitive dependence of the final value of
y_2 on the initial value of x_1: y_2 on γ_2 from statement 2’s contribution to du, γ_2 on e_1 from its
contribution to cd, and e_1 on x_1 from statement 1’s contribution to du.
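
The chain of dependences just described can be checked mechanically. The Python sketch below encodes the pairs listed above as tuples of (variable, site), with the string "gamma" standing in for the special variable that marks a definition; the encoding and the helper name `depends_on` are assumptions of this example.

```python
# Instances are (variable, site) pairs; "gamma" stands for the special variable.
du = {(("e", 1), ("x", 1)), (("x", 1), ("x", 1)),
      (("e", 1), ("gamma", 1)), (("x", 1), ("gamma", 1)),
      (("y", 2), ("z", 2)), (("y", 2), ("gamma", 2))}
cd = {(("gamma", 2), ("e", 1))}


def depends_on(start, relation):
    """All instances reachable from start by following pairs of the relation."""
    seen, work = set(), [start]
    while work:
        a = work.pop()
        for (p, q) in relation:
            if p == a and q not in seen:
                seen.add(q)
                work.append(q)
    return seen


reached = depends_on(("y", 2), du | cd)
print(("x", 1) in reached)  # True: y_2 -> gamma_2 -> e_1 -> x_1
```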

Like  the  standard  PDG,  the  abstract  PDG  has  special  entry  and  exit  sites.  All  variables  are 
defined  at  the  entry  and  used  at  the  exit: 

entry, exit: Site

Var × {entry} ⊆ dom du

Var × {exit} ⊆ ran du

The ud and cd relations for the program Sum2 are shown in Figure 4. The du relation is
not easy to display graphically, but is shown in part for the add call in Figure 5. There is no
labelling of cd edges; this would convey no extra information (and for a compound predicate,
as here, would show control dependence on a temporary variable).

The labelling of ud edges is implicit in the conventional PDG, since it can be inferred from
the plain edges and the definition and use sets of the nodes. The novelty of the abstract PDG
is the du relation, which, having yielded the definition and use sets of the nodes, plays no role
in the construction of the graph. The role of du becomes central, however, when the graph is
used. The next section explains some useful closure relations based on ud, cd and du;
subsequent sections show how they are used.


5 Closures of the Abstract PDG

Instead  of  giving  explicit  graph  traversal  algorithms,  we  shall  formulate  our  analysis  of  the 
abstract  PDG  with  the  relational  operators  of  Z  [Spi89],  summarized  in  Appendix  1.  This  has 
several  advantages.  First,  the  exposition  is  terser  and,  we  hope,  easier  to  understand.  Second, 
it suggests an efficient implementation in which the closure relations are calculated only once
(being derivable from the abstract PDG), and a variety of partial representations is then
produced by restricting their domain and range. Third, the relational expressions are less biased
than explicit traversal algorithms, and might be used to justify other implementation
strategies.

The  closure  relations  express  directly  the  transitive  dependence,  due  to  a  path  through  the 
graph,  of  a  variable  at  one  site  on  a  variable  at  another  site.  Since  the  difference  between  a 
control  dependence  and  a  dataflow  dependence  cannot  be  observed  at  the  endpoints  of  a 
path,  we  start  by  merging  the  ud  and  cd  relations  into  a  single  relation  ucd  expressing  all 
edges  between  nodes: 

ucd = ud ∪ cd

Four closure relations arise naturally. The first, UD, associates a use of some instance with a
definition of another instance, and can be thought of as an abstraction of a subgraph into a set
of edges. An instance x_i is related by UD to an instance y_j if there is a definition of y at site j
that might affect the value of x where it is used at site i. This might come about because of a
direct use-def edge (ucd), or because of a path with two edges and a mediating node (ucd o du
o ucd), or a path of three edges (ucd o du o ucd o du o ucd), and so on, giving

UD = ucd o (du o ucd)*

The UD relation for Sum2 (Figure 4), for example, would include the pair (x_3, s_1), indicating
that the use of x in the predicate x < 11 may be affected by the definition of s in s = 0.

Figure 6: How basic and closure edges are related

The second relation, DU, abstracts a subgraph into a set of nodes. An instance x_i is related
by DU to an instance y_j if the definition of x at i can be affected by the value of y when used
at j. Again, this might come about directly or along a path, which this time starts and ends
with a node rather than an edge:

DU = du o (ucd o du)*

The DU relation for Sum2 would include the pair (s_4, x_2), since the definition of s in add() may
be affected by the value of x used in x = 1.

The third relation, UU, relates uses to uses, and is defined in terms of an even number of
hops:

UU = (ucd o du)*

Finally, DD relates definitions to definitions:

DD = (du o ucd)*

The UU relation for Sum2 (Figure 4) would contain (x_4, x_3) since the use of x by add() depends
on the use by x < 11; the DD relation would include (s_4, x_2) since the definition of s by add()
depends on the definition of x by x = 1. Both of these closures are reflexive, and include
pairs like (x_i, x_i).

The  relationship  between  the  basic  relations  and  the  closures  is  illustrated  in  Figure  6. 
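
To make the four closures concrete, the following Python sketch builds them for Sum2 with explicit relational composition. It is a sketch under several assumptions: o is read as left-to-right composition (the reading under which UU relates uses to uses), the strings "gamma" and "bot" stand for the two special variables, "e" is a hypothetical temporary for the loop predicate, and the entry and exit sites and any control dependence of the loop predicate on itself are omitted.

```python
from itertools import product

G, B = "gamma", "bot"  # stand-ins for the two special variables

# Basic relations for Sum2 (sites 1: s = 0, 2: x = 1, 3: while x < 11, 4: add()).
du = {(("s", 1), (B, 1)), (("s", 1), (G, 1)),
      (("x", 2), (B, 2)), (("x", 2), (G, 2)),
      (("e", 3), ("x", 3)), (("e", 3), (G, 3)),
      (("s", 4), ("s", 4)), (("s", 4), ("x", 4)), (("x", 4), ("x", 4)),
      (("s", 4), (G, 4)), (("x", 4), (G, 4))}
ud = {(("x", 3), ("x", 2)), (("x", 3), ("x", 4)),
      (("x", 4), ("x", 2)), (("x", 4), ("x", 4)),
      (("s", 4), ("s", 1)), (("s", 4), ("s", 4))}
cd = {((G, 4), ("e", 3))}
ucd = ud | cd


def compose(r, s):
    """Left-to-right composition: a is related to c when a r b and b s c."""
    return {(a, c) for (a, b), (b2, c) in product(r, s) if b == b2}


def star(r, universe):
    """Reflexive transitive closure of r over the given set of instances."""
    closure = {(a, a) for a in universe}
    frontier = closure
    while frontier:
        frontier = compose(frontier, r) - closure
        closure |= frontier
    return closure


INSTANCES = {i for pair in du | ucd for i in pair}
UD = compose(ucd, star(compose(du, ucd), INSTANCES))
DU = compose(du, star(compose(ucd, du), INSTANCES))
UU = star(compose(ucd, du), INSTANCES)
DD = star(compose(du, ucd), INSTANCES)

print((("x", 3), ("x", 2)) in UD)  # True: direct use-def edge
print((("x", 4), ("x", 3)) in UU)  # True: via the control dependence on site 3
print((("s", 4), ("x", 2)) in DD)  # True: s_4 is defined from x_4, defined at 2
```

Computing the closures once and then restricting them, rather than traversing the graph per query, is the implementation strategy the text suggests.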
6 Chopping: A Generalization of Slicing

Chopping is a focusing mechanism like slicing. The user selects two sets of variable instances,
source and sink; chopping then yields the subgraph that shows how the definitions of the
instances in source can affect the uses of the instances in sink.

The  subgraph  is  defined  as  a  subset  of  the  use-def  relation  ucd: 

ucd' = UU(sink) ◁ ucd ▷ DD~(source)

DD~(source), the image of source under the inverse of DD, is the set of definitions that might
be affected by the definitions in source. Similarly, UU(sink), the image of the set sink under
UU, is the set of uses that the uses in sink depend on. The relevant use-def edges are those that
connect a use in UU(sink) to a definition in DD~(source). These are obtained by restricting
the domain of ucd to the former and its range to the latter.

To distinguish dataflow and control-flow dependences in the subgraph, it may be computed
in two parts which are then superimposed; first the dataflow edges

ud' = UU(sink) ◁ ud ▷ DD~(source)

are displayed as solid lines, and then the control-flow edges

cd' = UU(sink) ◁ cd ▷ DD~(source)

which are shown dotted. The relevant edges of the def-use relation may be found easily too:

du' = DD~(source) ◁ du ▷ UU(sink)

and perhaps displayed as in Figure 5 (although our tool does not currently do this).
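
A chop can be sketched directly from these definitions. The Python below computes UU(sink) and DD~(source) by simple fixed-point iteration rather than from precomputed closures, then restricts ucd. The relations encode Sum2 without entry and exit sites; the names `uu_image`, `dd_inverse_image` and `chop`, the temporary "e", and the strings "gamma" and "bot" for the special variables are all assumptions of this example.

```python
G, B = "gamma", "bot"  # stand-ins for the two special variables

# Sum2: sites 1: s = 0, 2: x = 1, 3: while x < 11, 4: add()
du = {(("s", 1), (B, 1)), (("s", 1), (G, 1)),
      (("x", 2), (B, 2)), (("x", 2), (G, 2)),
      (("e", 3), ("x", 3)), (("e", 3), (G, 3)),
      (("s", 4), ("s", 4)), (("s", 4), ("x", 4)), (("x", 4), ("x", 4)),
      (("s", 4), (G, 4)), (("x", 4), (G, 4))}
ucd = {(("x", 3), ("x", 2)), (("x", 3), ("x", 4)),
       (("x", 4), ("x", 2)), (("x", 4), ("x", 4)),
       (("s", 4), ("s", 1)), (("s", 4), ("s", 4)),
       ((G, 4), ("e", 3))}
INSTANCES = {i for pair in du | ucd for i in pair}


def image(rel, xs):
    """Relational image of the set xs under rel."""
    return {b for (a, b) in rel if a in xs}


def inverse(rel):
    return {(b, a) for (a, b) in rel}


def uu_image(sink):
    """UU(sink): the uses that the uses in sink depend on."""
    uses = set(sink)
    while True:
        more = image(du, image(ucd, uses)) - uses
        if not more:
            return uses
        uses |= more


def dd_inverse_image(source):
    """DD~(source): the definitions that the definitions in source may affect."""
    defs = set(source)
    while True:
        more = image(inverse(du), image(inverse(ucd), defs)) - defs
        if not more:
            return defs
        defs |= more


def chop(source, sink):
    """Restrict the domain of ucd to UU(sink) and its range to DD~(source)."""
    uses, defs = uu_image(sink), dd_inverse_image(source)
    return {(u, d) for (u, d) in ucd if u in uses and d in defs}


def sites(edges):
    return {site for pair in edges for (_, site) in pair}


print(sorted(sites(chop(INSTANCES, {("x", 3)}))))  # [2, 3, 4]
print(sorted(sites(chop({("s", 1)}, INSTANCES))))  # [1, 4]
```

The first query is a backward slice on the use of x in the loop predicate and the second a forward slice from s = 0; any other pair of instance sets can be supplied in the same way.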

The various forms of slice criteria may be expressed as special cases of chopping. Reps’s
criterion [RY89] identifies all the sites that contribute to the use or definition of some variable
v at site i. To account for a use we include {v_i} in the sink set, and to account for a definition
we include du({v_i}). The equivalent chopping criterion is thus

source = Var × Site, sink = {v_i} ∪ du({v_i}).

Weiser’s  criterion  [Wei84]  identifies  all  sites  that  contribute  to  the  values  of  the  variables  in  a 
set  V  just  before  execution  of  a  statement  at  some  site  i.  So  long  as  the  variables  V  are  used  at 
site  i,  the  criterion  may  be  expressed  as 

source = Var × Site, sink = V × {i}.

Forward  slicing  [YL88]  marks  the  sites  that  are  subsequently  affected  by  a  definition  of  a 
variable  at  some  site.  For  a  variable  v  defined  at  site  i,  the  criterion  is 

source = {v_i}, sink = Var × Site.

Finally,  a  variant  of  slicing  proposed  for  use  in  maintenance  [GL91]  identifies  all  the  sites 
that  affect  the  final  value  of  a  variable,  or  a  definition  at  any  site.  For  a  variable  v,  this  requires 

sink = du({v} × Site) ∪ {v_exit}, source = Var × Site.
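
The four criteria can be written as small functions that return a (source, sink) pair. This Python sketch is illustrative only: the function names, the representation of instances as (variable, site) tuples and the constant `EXIT` are assumptions, and `instances` stands for Var × Site.

```python
EXIT = "exit"  # hypothetical name for the exit site


def image(rel, xs):
    """Relational image of the set xs under rel."""
    return {b for (a, b) in rel if a in xs}


def reps(v, i, du, instances):
    """Backward slice on the use or definition of v at site i."""
    return instances, {(v, i)} | image(du, {(v, i)})


def weiser(variables, i, instances):
    """Backward slice on a set of variables used at site i."""
    return instances, {(v, i) for v in variables}


def forward(v, i, instances):
    """Forward slice from the definition of v at site i."""
    return {(v, i)}, instances


def final_value(v, all_sites, du, instances):
    """Sites affecting the final value of v or a definition of v at any site."""
    return instances, image(du, {(v, s) for s in all_sites}) | {(v, EXIT)}


# The def-use pairs of add() at site 4 of Sum2, without the special variables.
du_add = {(("s", 4), ("s", 4)), (("s", 4), ("x", 4)), (("x", 4), ("x", 4))}
everything = {("s", 4), ("x", 4)}
source, sink = reps("s", 4, du_add, everything)
print(sorted(sink))  # [('s', 4), ('x', 4)]
```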

Examples. Figure 7 shows the results of applying four different criteria to the program Sum2.
To see how the final value of x is determined, we slice on its final use:

source = Var × Site, sink = {x_exit}

The result (7a) shows not only which sites are included but also why, through the labelling of
edges. The absence of edges marked s, for instance, shows that s is not relevant to the
computation of x (even though add() uses and defines s).


program Sum2
    procedure add
41      s = s + x
42      x = x + 1
    end
1   s = 0
2   x = 1
3   while x < 11 do
4       add()
    end
end

The program

[Graphics for panels a-d not reproduced. Recovered panel titles: c: Effect of initial x on final s; d: Role of add()]

Figures 7a-d: Chopping Sum2
To see the effect of the initialization of s, we forward slice:

source = {s_1}, sink = Var × Site

The result (7b) shows add() again, but in a different role; this time its use and definition of x
are irrelevant. The loop predicate has gone too, since the initial value of s does not affect the
number of iterations.

Slicing is not always focused enough. If we want to know how the initial value of x affects
the final value of s, we would select

source = {x_2}, sink = {s_exit}

which gives a subgraph (7c) that is smaller than the slice

source = Var × Site, sink = {s_exit}

Finally,  another  example  of  a  query  that  cannot  be  cast  as  a  slice  criterion:  to  understand 
how add() behaves in the loop, we might select

source = Var × {4}, sink = Var × {4}

to  show  the  flows  that  originate  in  the  definitions  of  one  call  to  add  and  end  in  the  uses  of 
another  (7d). 


7  More  Precise  Slice  Criteria  and  Smaller  Slices 

The discovery that slicing reduces to a simple reachability problem in the PDG [OO84] has
made both slicing and the PDG more popular. But it has also led to a reformulation of the
slice criterion, weakening its precision considerably. The abstract PDG supports both a
precise slice criterion and a simple construction method.

Reps and his colleagues have formalized the relationship between slicing and the PDG
[RY89]. They define the slice of a program with respect to a program point p and variable v to
be all the statements and predicates that might affect the value of v at point p. Like us, they do
not permit a slice to be taken with respect to an arbitrary variable, but require that v is used or
defined at p. The slice is constructed by tracing backwards in the PDG from the node
corresponding to p. Not surprisingly, the variable v plays no role in this; the conventional PDG,
unlike ours, just relates statements. The slice criterion is thus really a node in the PDG, and,
indeed, is formalized in exactly this way.

As a result, for a given slice criterion, chopping will often produce better results. In the
program Sum (Figure 1), for example, a slice on the use of x in the statement s = s + x would
spuriously include s = 0, which our method correctly eliminates. The PDG algorithm traces
back along all the edges from the node at which the variable is used, even those due to uses of
different variables. This inaccuracy becomes disastrous if many variables are used at one
node, which is why Reps is forced to provide a separate exit node for each program variable.
Otherwise, slicing on a final use of any variable would yield the entire program.

Chopping not only gives a more accurate slice, but also allows a more precise criterion.
Suppose we want to see the code that affects the value of s coming into statement 4, s = s + x.
The PDG formulation of slicing does not distinguish uses and definitions, so a slice on s
would include the code that determines x too. Our method does not suffer from this problem;
the use of s is expressed by

source = Var × Site, sink = {s_4}

and the definition by

source = Var × Site, sink = du({s_4}).

Recall, finally, that chopping - unlike slicing - is modular. It should make sense to slice on
a variable at a procedure call if the procedure uses that variable. Interprocedural slicing
techniques [HDC88, HRB90, Bin93], however, associate with the call node only the uses of
actual parameters. If the variable is a global, the user must find the first use in the procedure
body and slice on that. Our abstract PDG gives the procedure call the expected dependences,
so that the call node uses a variable when there is a use in the body. We can slice, for example,
on the use of x in the call add() of Sum2 (Figure 4) by specifying a sink of $\{x_4\}$;
interprocedural slicing would require that we identify instead the first use in the
procedure body.

8  Forming  a  DU  Abstraction  of  a  Procedure 

The abstract PDG models a procedure call at a site i as a subset of the du relation consisting of
a set of pairs of the form $(x_i, y_i)$, where x is defined (due to the execution of the procedure) by
a use of y. We can think of the relation on variables

$$\mathit{Deps} \subseteq \mathit{Var} \times \mathit{Var}$$

as  a  kind  of  dependency  specification  of  the  procedure,  which  we  now  show  how  to  construct 
from  the  abstract  PDG  of  its  body. 

All variables are used at the exit and defined at the entry. Our task is to determine the
def-use relation of the subgraph that lies between these nodes.

First we calculate the def-use associations between variable instances that span the
subgraph by restricting the DU relation. Its domain is restricted to the instances whose
definitions reach the exit node, and its range is restricted to the instances that are reached by
definitions of the entry node:

$$\mathit{DefUses} = ud(\mathit{Var} \times \{\mathit{exit}\}) \lhd DU \rhd ud^{\sim}(\mathit{Var} \times \{\mathit{entry}\})$$

This does not include every relevant definition, however, because a variable set to a constant
will have no association to a use reached by the entry node. The instances that represent
definitions that reach the exit, excluding the dummy definitions at the entry, are

$$\mathit{Defs} = \mathrm{dom}\,(ud(\mathit{Var} \times \{\mathit{exit}\}) \lhd DU) \setminus (\mathit{Var} \times \{\mathit{entry}\}).$$

These definitions may be ambiguous [ASU88], since there may be paths from entry to exit on
which they do not occur. To distinguish ambiguous definitions, we collect the set of instances
that might be invariant, whose uses at the exit are reached by their definitions at the entry:

$$\mathit{Invs} = \mathrm{dom}\,((\mathit{Var} \times \{\mathit{exit}\}) \lhd UD \rhd (\mathit{Var} \times \{\mathit{entry}\}))$$

From the relation DefUses and the sets Defs and Invs we can now determine the
dependency specification. Since we do not care which sites are associated with the definitions and
uses, we project onto variables:

$$\mathit{DefUses}_v = \{(u, v) \mid \exists i, j.\ ((u, i), (v, j)) \in \mathit{DefUses}\}$$

$$\mathit{Defs}_v = \mathrm{dom}\ \mathit{Defs}$$

$$\mathit{Invs}_v = \mathrm{dom}\ \mathit{Invs}$$

[Figure 8 graph omitted; recoverable labels: entry; procedure add; 41: s = s + x; 42: x = x + 1; end]

Figure 8: Abstract PDG for add procedure

The variables that are defined but not associated with uses must have been reset to constants:

$$\mathit{Resets}_v = \mathit{Defs}_v \setminus \mathrm{dom}\ \mathit{DefUses}_v$$

The ambiguously defined variables are those that are both potentially invariant and
potentially defined:

$$\mathit{Maybes}_v = \mathit{Invs}_v \cap \mathit{Defs}_v$$

Finally, the dependency relation consists of the def-use associations, dependences of resets on
the constant symbol, and self-dependences of ambiguously defined variables:

$$\mathit{Deps} = \mathit{DefUses}_v \cup (\mathit{Resets}_v \times \{\bot\}) \cup \{(m, m) \mid m \in \mathit{Maybes}_v\}$$

Example. The add procedure of Figure 4 has the abstract PDG shown in Figure 8. Note that,
being a procedure and not a stand-alone program, it has edges from the entry node
representing flows from the calling context into the procedure body. The DU closure is in this case
identical to du, since no paths from a use to a definition span more than one node. The
restrictions reduce it to

$$\mathit{DefUses} = \{(s_{41}, s_{41}),\ (s_{41}, x_{41}),\ (x_{42}, x_{42})\}$$

and the resulting dependence relation is

$$\mathit{Deps} = \{(s, s),\ (s, x),\ (x, x)\}.$$

Were x = x + 1 to be replaced by x = 1, we would have

$$\mathit{DefUses} = \{(s_{41}, s_{41}),\ (s_{41}, x_{41})\}$$

and

$$\mathit{Defs} = \{s_{41},\ x_{42}\}$$

giving

$$\mathit{Resets}_v = \{x\}$$

and a dependence relation of

$$\mathit{Deps} = \{(s, s),\ (s, x),\ (x, \bot)\}.$$
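
The construction can be replayed mechanically on this example. The following Python sketch is one possible encoding, not the Chopshop implementation: relations are sets of pairs of (variable, site) instances, cd is taken to be empty so that ucd = ud, no du pairs are recorded at the entry and exit nodes, and in the reset variant the definition of x at site 42 is assumed to depend on an instance of the constant symbol at that site. Invs is computed with the UD closure exactly as the formula is written. All function and variable names are hypothetical.

```python
BOT = "⊥"  # the constant symbol


def compose(p, q):
    """p o q: pairs (a, c) such that (a, b) is in p and (b, c) is in q."""
    return {(a, c) for (a, b) in p for (b2, c) in q if b == b2}


def star(p, universe):
    """Reflexive transitive closure of p over the given set of instances."""
    result = {(e, e) for e in universe}
    while True:
        bigger = result | compose(result, p)
        if bigger == result:
            return result
        result = bigger


def image(p, s):
    return {b for (a, b) in p if a in s}


def inverse(p):
    return {(b, a) for (a, b) in p}


def restrict(s, p, t=None):
    """Domain restriction to s and, if t is given, range restriction to t."""
    return {(a, b) for (a, b) in p if a in s and (t is None or b in t)}


def du_abstraction(du, ud, variables):
    universe = {e for pair in du | ud for e in pair}
    DU = compose(du, star(compose(ud, du), universe))
    UD = compose(ud, star(compose(du, ud), universe))
    at_entry = {(v, "entry") for v in variables}
    at_exit = {(v, "exit") for v in variables}
    last_defs = image(ud, at_exit)            # definitions reaching the exit
    first_uses = image(inverse(ud), at_entry)  # uses reached from the entry
    def_uses = restrict(last_defs, DU, first_uses)
    defs = {d for (d, _) in restrict(last_defs, DU)} - at_entry
    invs = {u for (u, _) in restrict(at_exit, UD, at_entry)}
    # Project instances onto variables.
    def_uses_v = {(d[0], u[0]) for (d, u) in def_uses}
    defs_v = {v for (v, _) in defs}
    invs_v = {v for (v, _) in invs}
    resets_v = defs_v - {d for (d, _) in def_uses_v}
    maybes_v = invs_v & defs_v
    return def_uses_v | {(v, BOT) for v in resets_v} | {(m, m) for m in maybes_v}


# add: 41: s = s + x; 42: x = x + 1
ud_add = {
    (("s", 41), ("s", "entry")), (("x", 41), ("x", "entry")),
    (("x", 42), ("x", "entry")),
    (("s", "exit"), ("s", 41)), (("x", "exit"), ("x", 42)),
}
du_add = {(("s", 41), ("s", 41)), (("s", 41), ("x", 41)), (("x", 42), ("x", 42))}

# Variant with 42: x = 1 (the use of BOT at site 42 is an assumption).
ud_reset = ud_add - {(("x", 42), ("x", "entry"))}
du_reset = {(("s", 41), ("s", 41)), (("s", 41), ("x", 41)), (("x", 42), (BOT, 42))}

deps_add = du_abstraction(du_add, ud_add, {"s", "x"})
deps_reset = du_abstraction(du_reset, ud_reset, {"s", "x"})
```

Under these assumptions deps_add and deps_reset agree with the two dependence relations given in the example.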

A dependency relation is constructed, bottom-up, for each procedure in the program. To
find the contribution of a procedure call to the du relation at some site i, the variables of the
called procedure’s dependence relation are renamed to match formals to actuals; they are
instantiated at i; and, as explained in Section 4, related in addition to the special variable y as
a hook for control dependences.
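
A minimal Python sketch of this instantiation step follows. It rests on assumptions not spelled out here: Deps is a set of (defined, used) variable pairs, globals keep their names, and the control hook is modelled by relating every definition at the call site to an instance of the special variable at that site. The names instantiate, CONTROL and formals_to_actuals are hypothetical, and the site number 4 in the usage line is chosen only for illustration.

```python
CONTROL = "y"  # the special control variable, spelled as in the text


def instantiate(deps, formals_to_actuals, site):
    """Turn a procedure's Deps relation into du pairs at a call site."""
    def rename(v):
        # Formals are replaced by actuals; globals keep their names.
        return formals_to_actuals.get(v, v)

    du = {((rename(d), site), (rename(u), site)) for (d, u) in deps}
    # Assumed control hook: every definition also depends on CONTROL at the site.
    du |= {(d, (CONTROL, site)) for (d, _) in set(du)}
    return du


# Deps of add from the example; add takes no parameters, so nothing is renamed.
call_du = instantiate({("s", "s"), ("s", "x"), ("x", "x")}, {}, 4)
```

With a parameterized procedure, a mapping such as {"total": "s"} would rename the formal total to the actual s before the pairs are attached to the site.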


9  The  Chopshop  Tool 

We have implemented the abstract dependence graph and chopping algorithm for C
programs. The tool is written in Standard ML of New Jersey. It compiles the C code into an
intermediate representation from which a conventional PDG is constructed, and then builds the
basic and closure relations of the abstract PDG. The def-use contributions of called
procedures are computed when required and cached. The tool is connected by a Unix pipe to
Emacs, so that mouse clicks can be used to select variables and sites. The result of
chopping, a graph, is output in adjacency-list form, which is converted to a PostScript picture by
dotty and previewed with Ghostscript. We have experimented for the last half-year on small
programs with no aliasing, but we hope to extend the tool so that it can handle industrial C
code by the end of the year. Performance will be a critical issue, but we see no reason that the
abstract PDG should be fundamentally less tractable than the standard PDG. Although the
closure relations are potentially enormous (varying with the squares of the numbers of sites
and variables), they are very sparse in practice, since only a few variables are typically defined
or used at a given site.


10  Related  Work 

Tools that analyze relationships between program components have been around for some
time. CScope [Ste85] and CIA [CNR90], for example, derive relationships from the abstract
syntax tree, such as which procedures call which, and respond to queries in a relational
database language. Refine [M+94, Rea92] can present this kind of information in a variety of
diagram styles. None of these tools does any serious dataflow analysis. Refine/C, for example,
can show when one procedure uses a variable defined by another, but considers a variable to
be defined by a procedure whenever it occurs on the left-hand side of any statement in the
procedure body.

To trim derived representations of large programs, various mechanisms have been devised.
CodeBase [Sel91] organizes CIA output with syntactic rules expressing notions from the
application domain. Rigi [MK88, M+92] provides various aggregation and generalization
mechanisms. In both cases, the abstractions are syntactic and specified manually. In the
“interface slicing” of [BE93], the user selects some subset of the operations of a module, and the
tool uses the call graph to eliminate operations that are dead code.

Only a few slicing tools have been built for commercial use, and most of these are for
Cobol. Andersen Consulting’s Cobol/SRE [NEK93, NEK94] is probably the most complete: it
does PDG-based forward and backward slicing, as well as “condition-based slicing” which is
really not slicing at all, but rather a kind of partial evaluation. Procedures are rarely used in
Cobol, so interprocedural analyses are unnecessary; perform statements are handled by
copying the subgraph of the performed paragraph.

Weiser’s scheme for interprocedural slicing [Wei84] gives poor results because it fails to
distinguish call sites, so that a call at one site appears to be able to return to another. A more
accurate analysis based on the PDG has been invented since [HRB90, Bin93]. The essential
idea is to construct a PDG for the whole system with special nodes for procedure call and
return, along with an attribute grammar to keep track of the mapping of formals to actuals.
The resulting slices are good for many applications but less appropriate for reverse
engineering. Unlike our approach, the slices are not modular; a procedure call is a link to another part
of the graph rather than an abstraction barrier, so that to understand why a procedure call is
included in a slice, the user must follow the dependency associations into the procedure body.
Chopshop, by labelling the ud edges in and out of a procedure call, and by providing its du
abstraction on request, allows it to be understood abstractly in context. The user can, of
course, follow inside the body too, by chopping the body on the variables entering and
leaving the call.

This  technique,  as  well  as  another  interprocedural  slicing  technique  [HDC88],  can  handle 
recursive  programs;  we  cannot  yet  do  this.  More  seriously,  we  do  not  handle  aliasing,  a 
thorny  problem  in  dataflow  analysis.  We  intend  to  incorporate  ideas  from  recent  work  such 
as  [HS94]  and  especially  [LR92,  LRZ93,  PLR94]  which  address  the  kinds  of  pointer  aliasing 
that  arise  in  C  programs. 

The  du  abstraction  originates  in  the  Aspect  specifications  of  our  previous  work  [Jac91, 
Jac92,  Jac93].  Relational  models  of  dependences  have  been  used  before,  but  this  paper  seems 
to  be  the  first  to  separate  the  use-def  relations  of  the  PDG  edges  from  the  def-use  relations  of 
the  nodes.  The  relations  underlying  the  Spade  tool  [BC85]  are  the  closest  in  spirit,  since  they 
associate  variables  and  expressions  at  particular  program  points.  They  are  also  used  to  extract 
“partial  statements”  which  seem  to  be  identical  to  Weiser’s  slices.  Like  our  previous  work,  the 
relations  are  defined  compositionally  over  the  syntax,  prohibiting  programs  with  arbitrary 
jumps. Moreover, a fundamental assumption that each variable occurrence is either a
definition or a use prevents the handling of procedure calls with side effects.

Moriconi  and  Winkler’s  inference  rule  system  for  determining  the  scope  of  a  program 
change  [MW90]  also  defines  a  dependence  relation  inductively  over  the  syntax.  Procedure 
calls  are  abstracted,  since  the  proof  of  a  procedure  call’s  dependence  can  be  built  from  rules 
applied  to  its  body.  The  proof  of  a  dependence  is  a  kind  of  explanation  akin  to  the  labelling  of 
our  ud  edges.  Apparently  this  system  can  support  slicing  too;  it  would  be  interesting  to  see  if 
it  could  be  extended  to  more  general  chopping  criteria.  Its  main  disadvantage  is  performance, 
since  the  inference  system  evaluates  the  dependence  relation  pair-by-pair. 

Wilde and Huitt discuss “external dependency graphs”, which are similar to our du
abstraction, along with a variety of other dependence relations [WH91], but do not explain
how they are constructed and used.

Theories of program dependences [CF89, PC90] and slicing [RY89] have been developed.
We  hope  eventually  to  develop  a  theory  that  explains  our  dependence  relations  and  justifies 
an  operational  interpretation  of  chopping. 


11  Conclusion  and  Future  Work

The  standard  PDG  provides  no  way  to  abstract  the  behaviour  of  a  procedure.  We  have  shown 
that  this  requires  enriching  the  notion  of  dependence  to  relate  not  just  sites  but  variable/site 
pairs.  Linking  the  definition  of  a  variable  at  one  site  with  a  subsequent  use  at  another  is  not 
enough;  we  must  also  make  explicit  the  dependences  of  a  site’s  outgoing  definitions  on  its 
incoming  uses  -  the  du  relation  of  our  abstract  PDG. 

The abstract PDG supports chopping, a novel focusing mechanism that is both more
flexible than program slicing and more accurate, even on standard slice criteria. Chopping is
modular, so that a procedure call can be treated as a simple statement, whose inclusion can be
justified with the du relation and explained to the user by the labelling of edges.

Appendix  1:  Relational  Operators

The paper uses the following operators on sets and relations. Throughout, s and t are sets of
elements of type T, and p and q are binary relations on T.

$$s \setminus t = \{e : s \mid e \notin t\}$$

$$I = \{(t, t) \mid t : T\}$$

$$\mathrm{dom}\ p = \{a : T \mid \exists b : T.\ (a, b) \in p\}$$

$$\mathrm{ran}\ p = \{b : T \mid \exists a : T.\ (a, b) \in p\}$$

$$p^{\sim} = \{(b, a) \mid (a, b) \in p\}$$

$$p \circ q = \{(a, b) \mid \exists z : T.\ (a, z) \in p \land (z, b) \in q\}$$

$$s \lhd p = \{(a, b) \in p \mid a \in s\}$$

$$p \rhd s = \{(a, b) \in p \mid b \in s\}$$

$$p(s) = \{b \mid \exists a : s.\ (a, b) \in p\}$$

The reflexive and transitive closure of p, written $p^*$, is the smallest relation containing p that
is reflexive ($I \subseteq p^*$) and transitive ($p^* \circ p^* \subseteq p^*$). It may be calculated iteratively as:

$$p^* = I \cup p \cup (p \circ p) \cup (p \circ p \circ p) \cup \dots$$
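
These operators translate directly into operations on Python sets of pairs. The sketch below is one possible rendering, with hypothetical function names; the closure follows the iterative formula above, adding successive powers of p until a power contributes nothing new, and assumes that every element mentioned in p belongs to the carrier set t.

```python
def identity(t):
    return {(e, e) for e in t}


def dom(p):
    return {a for (a, _) in p}


def ran(p):
    return {b for (_, b) in p}


def inverse(p):
    return {(b, a) for (a, b) in p}


def compose(p, q):
    """p o q: first p, then q."""
    return {(a, c) for (a, b) in p for (b2, c) in q if b == b2}


def dres(s, p):
    """Domain restriction of p to s."""
    return {(a, b) for (a, b) in p if a in s}


def rres(p, s):
    """Range restriction of p to s."""
    return {(a, b) for (a, b) in p if b in s}


def image(p, s):
    """Relational image p(s)."""
    return {b for (a, b) in p if a in s}


def star(p, t):
    """Reflexive transitive closure: I, p, p o p, ... unioned until stable."""
    result = identity(t)
    power = result
    while True:
        power = compose(power, p)   # next power of p
        if power <= result:         # nothing new, so later powers add nothing
            return result
        result |= power


p = {(1, 2), (2, 3)}
closure = star(p, {1, 2, 3})
```

For this small p the closure consists of the three identity pairs together with (1, 2), (2, 3) and (1, 3).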

Appendix  2:  Summary  of  Definitions 

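Definition of abstract PDG:

$$\mathit{Var} = \mathit{ProgramVariables} \cup \{\bot, y\}$$

$$\mathit{Instance} = \mathit{Var} \times \mathit{Site}$$

$$\mathit{entry}, \mathit{exit} : \mathit{Site}$$

$$du, ud, cd : \mathit{Instance} \leftrightarrow \mathit{Instance}$$

$$\mathit{Var} \times \{\mathit{entry}\} \subseteq \mathrm{dom}\ du$$

$$\mathit{Var} \times \{\mathit{exit}\} \subseteq \mathrm{ran}\ du$$

$$ucd = ud \cup cd$$

Closure relations:

$$UD = ucd \circ (du \circ ucd)^*$$

$$DU = du \circ (ucd \circ du)^*$$

$$UU = (ucd \circ du)^*$$

$$DD = (du \circ ucd)^*$$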

Chop from source to sink:

$$ud' = UU(\mathit{sink}) \lhd ud \rhd DD^{\sim}(\mathit{source})$$

$$cd' = UU(\mathit{sink}) \lhd cd \rhd DD^{\sim}(\mathit{source})$$

$$du' = DD^{\sim}(\mathit{source}) \lhd du \rhd UU(\mathit{sink})$$
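
As a rough illustration, the three chop relations can be evaluated with a few set comprehensions. The Python sketch below transcribes the formulas literally; it is not the Chopshop implementation. The three-statement program, its site numbers and all function names are assumptions made for the example, and cd is empty because the fragment has no predicates.

```python
def compose(p, q):
    return {(a, c) for (a, b) in p for (b2, c) in q if b == b2}


def inverse(p):
    return {(b, a) for (a, b) in p}


def image(p, s):
    return {b for (a, b) in p if a in s}


def restrict(s, p, t):
    """Domain restriction of p to s and range restriction to t."""
    return {(a, b) for (a, b) in p if a in s and b in t}


def star(p, universe):
    result = {(e, e) for e in universe}
    while True:
        bigger = result | compose(result, p)
        if bigger == result:
            return result
        result = bigger


def chop(du, ud, cd, source, sink):
    ucd = ud | cd
    universe = {e for pair in du | ucd for e in pair} | source | sink
    UU = star(compose(ucd, du), universe)
    DD = star(compose(du, ucd), universe)
    uses = image(UU, sink)             # UU(sink)
    defs = image(inverse(DD), source)  # DD~(source)
    return (restrict(uses, ud, defs),
            restrict(uses, cd, defs),
            restrict(defs, du, uses))


# Hypothetical fragment:  1: s = 0;  2: x = 1;  4: s = s + x
ud = {(("s", 4), ("s", 1)), (("x", 4), ("x", 2))}
du = {(("s", 4), ("s", 4)), (("s", 4), ("x", 4))}
everything = {("s", 1), ("x", 2), ("s", 4), ("x", 4)}

# Backward slice on the use of x at site 4, expressed as a chop.
ud1, cd1, du1 = chop(du, ud, set(), everything, {("x", 4)})
# Chop from the definition s = 0 to the use of s at site 4.
ud2, cd2, du2 = chop(du, ud, set(), {("s", 1)}, {("s", 4)})
```

In the first query the only ud edge retained links the use of x at site 4 to its definition at site 2, so site 1 does not appear; in the second, the only ud edge retained links the use of s at site 4 to s = 0.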

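Forming DU abstraction of procedure

1. Determine last defs and first uses:

$$\mathit{DefUses} = ud(\mathit{Var} \times \{\mathit{exit}\}) \lhd DU \rhd ud^{\sim}(\mathit{Var} \times \{\mathit{entry}\})$$

2. Determine instances potentially defined or invariant:

$$\mathit{Defs} = \mathrm{dom}\,(ud(\mathit{Var} \times \{\mathit{exit}\}) \lhd DU) \setminus (\mathit{Var} \times \{\mathit{entry}\})$$

$$\mathit{Invs} = \mathrm{dom}\,((\mathit{Var} \times \{\mathit{exit}\}) \lhd UD \rhd (\mathit{Var} \times \{\mathit{entry}\}))$$

3. Project onto variables:

$$\mathit{DefUses}_v = \{(u, v) \mid \exists i, j.\ ((u, i), (v, j)) \in \mathit{DefUses}\}$$

$$\mathit{Defs}_v = \mathrm{dom}\ \mathit{Defs}$$

$$\mathit{Invs}_v = \mathrm{dom}\ \mathit{Invs}$$

4. Obtain dependence relation Deps:

$$\mathit{Resets}_v = \mathit{Defs}_v \setminus \mathrm{dom}\ \mathit{DefUses}_v$$

$$\mathit{Maybes}_v = \mathit{Invs}_v \cap \mathit{Defs}_v$$

$$\mathit{Deps} = \mathit{DefUses}_v \cup (\mathit{Resets}_v \times \{\bot\}) \cup \{(m, m) \mid m \in \mathit{Maybes}_v\}$$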